fix(datasets): increase create version request timeout (#389)
Conversation
gradient/commands/datasets.py
Outdated
```python
headers.update({'Content-Size': '0'})
r = session.put(url, data='', headers=headers, timeout=5)
# for files under 15MB
elif size <= (15e6):
```
Changed the threshold to 15MB from 500MB.
gradient/commands/datasets.py
Outdated
```python
elif size <= (15e6):
    with open(path, 'rb') as f:
        r = session.put(
            url, data=f, headers=headers, timeout=300)
```
Increase timeout from 5 seconds to 5 minutes
gradient/commands/datasets.py
Outdated
```python
presigned_url,
data=chunk,
headers=headers,
timeout=300)
```
Increase timeout from 5 seconds to 5 minutes
maybe this timeout can be pulled out too... 🤷
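A minimal sketch of what "pulling it out" could look like, assuming nothing beyond the diff above; the constant name and the `put_chunk` helper are illustrative, not from the PR:

```python
# Hoist the repeated 300-second timeout into one module-level
# constant so every upload call stays in sync.
UPLOAD_TIMEOUT_SECONDS = 300  # 5 minutes, up from the earlier 5s


def put_chunk(session, presigned_url, chunk, headers):
    # Same call as in the diff, but with the timeout named once.
    return session.put(
        presigned_url,
        data=chunk,
        headers=headers,
        timeout=UPLOAD_TIMEOUT_SECONDS,
    )
```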
```python
part_res = session.put(
    presigned_url,
    data=chunk,
    headers=headers,
```
Add headers
```python
# console! Which again, jank and noisy, but arguably
# better than a task sitting forever, never either
# completing or emitting an error message.
print(
```
Report every chunk
This isn't too noisy now that we removed the branch around it?
No, I don't think the previous one was noisy enough honestly. It feels like nothing is happening. In general, we just need a better progress bar.
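For context, per-chunk reporting along these lines is what's being discussed; this is only a sketch (the function name and format are illustrative, not the PR's code), printing a running byte count with a carriage return instead of one line per chunk:

```python
import sys


def report_progress(done_bytes, total_bytes):
    # Overwrite the same console line each time a part finishes,
    # so the user sees movement without a wall of output.
    pct = 100 * done_bytes / total_bytes
    line = "\ruploaded {}/{} bytes ({:.0f}%)".format(
        done_bytes, total_bytes, pct)
    sys.stdout.write(line)
    sys.stdout.flush()
    return line
```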
```python
content_type=result['mimetype'],
dataset_version_id=dataset_version_id,
key=result['key'])
with requests.Session() as session:
```
Use connection pooling from urllib3. Previously we weren't utilizing this feature, only the context manager.
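A minimal sketch of the idea, assuming the requests library; the pool sizes below are illustrative, not values from the PR. A shared Session lets urllib3 keep TCP/TLS connections to the storage host alive between part uploads:

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# Size the pool for the parallel upload workers (numbers are
# illustrative assumptions).
adapter = HTTPAdapter(pool_connections=4, pool_maxsize=8)
session.mount("https://", adapter)

# Each part upload then reuses a pooled connection instead of
# opening a fresh one per request, e.g.:
# session.put(presigned_url, data=chunk, headers=headers, timeout=300)
```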
gradient/commands/datasets.py
Outdated
```python
# We can dynamically assign a larger part size if needed,
# but for the majority of use cases we should be fine
# as-is
part_minsize = int(15e6)
```
Might be worth moving this byte size to a constant. I see it referenced above too.
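A sketch of that suggestion; the constant name and helper are illustrative, not the PR's code. The point is that the size threshold and the multipart chunker then can't drift apart:

```python
# 15 MB, shared by the "small file" threshold check and the
# multipart part size.
MULTIPART_PART_SIZE = int(15e6)


def needs_multipart(size):
    # Files larger than one part go through the multipart path.
    return size > MULTIPART_PART_SIZE
```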
🎉 This PR is included in version 2.0.5 🎉

The release is available on GitHub release.

Your semantic-release bot 📦🚀
```python
# we +2 the number of parts since we're doing floor
# division, which will cut off any trailing part
# less than the part_minsize, AND we want to 1-index
# our range to match what AWS expects for part
# numbers
```
Is this true? Shouldn't you use ceil? I have a gradient version create and gradient files put that crash when a file's size is an exact multiple of the 15MB chunk size (75MB). I suspect it's trying to read an extra chunk that doesn't exist; the progress bar says 90MB/75MB when it crashes. Does that make sense? I could be misreading things.
Also, I think you mean that you add 1 because part numbers start counting from 1, and you need to correct for range's exclusive upper bound. Not entirely sure. Could you check this out?
Looks like this was a known bug in a previous PR but intentionally left in. 🥇 The effect of reading past the end of the file was not predicted though.
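The reviewer's ceil suggestion can be sketched like this (the function name is illustrative): ceil gives exactly one part per started chunk, so a 75MB file at a 15MB part size yields 5 parts rather than the 7 that floor division plus 2 would, which is what caused the read past end-of-file:

```python
import math

PART_SIZE = int(15e6)  # 15 MB


def num_parts(size):
    # Ceiling division: one part per started chunk, no phantom
    # trailing parts when size divides PART_SIZE evenly.
    return math.ceil(size / PART_SIZE)


# AWS part numbers are 1-indexed, so iterate with:
# for part_number in range(1, num_parts(size) + 1): ...
```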
After some digging, it turns out that larger files were timing out due to the default timeout of 5 seconds. Since the errors are happening inside of the worker pool, they are never reported to the user.
Uploads now run inside a requests.Session() context so that connection pooling is used and connections to a given host are maintained between requests. This should yield slightly better performance for parallel uploads.

Screenshots
Screen.Recording.2022-06-21.at.8.25.35.PM.mov